-
Notifications
You must be signed in to change notification settings - Fork 2.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[exporter/loadbalancing] Adding AWS Cloud Map for Load Balancer Exporter discovery #27588
[exporter/loadbalancing] Adding AWS Cloud Map for Load Balancer Exporter discovery #27588
Conversation
436e611
to
56a9501
Compare
10bedf5
to
00aa631
Compare
@@ -53,7 +53,7 @@ This also supports service name based exporting for traces. If you have two or m | |||
Refer to [config.yaml](./testdata/config.yaml) for detailed examples on using the processor. | |||
|
|||
* The `otlp` property configures the template used for building the OTLP exporter. Refer to the OTLP Exporter documentation for information on which options are available. Note that the `endpoint` property should not be set and will be overridden by this exporter with the backend endpoint. | |||
* The `resolver` accepts a `static` node, a `dns` or a `k8s` service. If all three are specified, `k8s` takes precedence. | |||
* The `resolver` accepts a `static` node, a `dns`, a `k8s` service or `awsCloudMap`. If all four are specified, `k8s` takes precedence. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
TODO:
Define what to do in this case since currently when static
and dns
are defined, then throws a errMultipleResolversProvided
opentelemetry-collector-contrib/exporter/loadbalancingexporter/loadbalancer.go
Lines 55 to 57 in 0efc53a
if oCfg.Resolver.DNS != nil && oCfg.Resolver.Static != nil { | |
return nil, errMultipleResolversProvided | |
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it makes sense to return an error, and users should have this expectation as well. Can you please change this line in the doc to reflect what the code does?
9575682
to
caa3636
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks great! Thank you for your PR. @Aneurysm9, @bryan-aguilar, would someone from AWS be available to review this PR?
@@ -53,7 +53,7 @@ This also supports service name based exporting for traces. If you have two or m | |||
Refer to [config.yaml](./testdata/config.yaml) for detailed examples on using the processor. | |||
|
|||
* The `otlp` property configures the template used for building the OTLP exporter. Refer to the OTLP Exporter documentation for information on which options are available. Note that the `endpoint` property should not be set and will be overridden by this exporter with the backend endpoint. | |||
* The `resolver` accepts a `static` node, a `dns` or a `k8s` service. If all three are specified, `k8s` takes precedence. | |||
* The `resolver` accepts a `static` node, a `dns`, a `k8s` service or `awsCloudMap`. If all four are specified, `k8s` takes precedence. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it makes sense to return an error, and users should have this expectation as well. Can you please change this line in the doc to reflect what the code does?
* `namespace` The CloudMap namespace where the service is register, e.g. `cloudmap`. If no `namespace` is specified, this will fail to start the Load Balancer exporter. | ||
* `serviceName` The name of the service that you specified when you registered the instance, e.g. `otelcollectors`. If no `serviceName` is specified, this will fail to start the Load Balancer exporter. | ||
* `interval` resolver interval in go-Duration format, e.g. `5s`, `1d`, `30m`. If not specified, `30s` will be used. | ||
* `timeout` resolver timeout in go-Duration format, e.g. `5s`, `1d`, `30m`. If not specified, `1s` will be used. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is 1s a reasonable value here? If the default interval is 30s, the timeout can certainly be at around 5s, no?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, makes all the sense, I've updated this property and documentation
@@ -63,6 +63,12 @@ Refer to [config.yaml](./testdata/config.yaml) for detailed examples on using th | |||
* The `k8s` node accepts the following optional properties: | |||
* `service` Kubernetes service to resolve, e.g. `lb-svc.lb-ns`. If no namespace is specified, an attempt will be made to infer the namespace for this collector, and if this fails it will fall back to the `default` namespace. | |||
* `ports` port to be used for exporting the traces to the addresses resolved from `service`. If `ports` is not specified, the default port 4317 is used. When multiple ports are specified, two backends are added to the load balancer as if they were at different pods. | |||
* The `awsCloudMap` node accepts the following properties: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I know nothing about AWS Cloud Map, is there anything users have to be aware before configuring it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The only relevant topic I can think of is that usage of the SDK hits the API every polling interval, so users have to take into account the AWS API Rate limits in their configuration.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can this comment be added to the readme? Or is it so obvious that it doesn't have to?
* `serviceName` The name of the service that you specified when you registered the instance, e.g. `otelcollectors`. If no `serviceName` is specified, this will fail to start the Load Balancer exporter. | ||
* `interval` resolver interval in go-Duration format, e.g. `5s`, `1d`, `30m`. If not specified, `30s` will be used. | ||
* `timeout` resolver timeout in go-Duration format, e.g. `5s`, `1d`, `30m`. If not specified, `1s` will be used. | ||
* `port` port to be used for exporting the traces to the addresses resolved from `service`. By default, the port is set in Cloud Map, but can be be overridden with a static value in this config |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There seems to be a healthStatus
property as well. Should it be documented?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For this property, in this initial iteration, I'd rather to leave this config as default since we filter only for Healthy instances
Reference: https://docs.aws.amazon.com/cloud-map/latest/api/API_GetInstancesHealthStatus.html
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I might be missing something, but the value the user is providing is indeed being used, and therefore, deserves to be documented.
@@ -52,7 +52,20 @@ type loadBalancerImp struct { | |||
func newLoadBalancer(params exporter.CreateSettings, cfg component.Config, factory componentFactory) (*loadBalancerImp, error) { | |||
oCfg := cfg.(*Config) | |||
|
|||
if oCfg.Resolver.DNS != nil && oCfg.Resolver.Static != nil { | |||
var count = 0 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This works, and I like that it provides a short-circuit feature to the function. But I propose two alternatives:
- externalize this into a
func checkMultipleResolversProvided(Config) error
, and call this function here. It should make your code easier to test and leave the existing function more readable. - alternatively, you can create a boolean
configured
, and set it to true whenever a resolver is not nil. On the next resolver, you check if configured is true and return the errMultipleResolversProvided error. You won't have the fail-fast behavior, though.
|
||
func (r *cloudMapResolver) start(ctx context.Context) error { | ||
if _, err := r.resolve(ctx); err != nil { | ||
r.logger.Warn("Failed initial resolve", zap.Error(err)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
r.logger.Warn("Failed initial resolve", zap.Error(err)) | |
r.logger.Warn("failed initial resolve", zap.Error(err)) |
|
||
_ = stats.RecordWithTags(ctx, awsResolverSuccessTrueMutators, mNumResolutions.M(1)) | ||
|
||
r.logger.Debug("DiscoverInstance", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this deserves a better message.
} else { | ||
endpoint = fmt.Sprintf("%s:%d", ipAddr, *r.port) | ||
} | ||
r.logger.Debug("Resolved instance", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
r.logger.Debug("Resolved instance", | |
r.logger.Debug("resolved instance", |
62747e7
to
e7f13f2
Compare
e7f13f2
to
573268f
Compare
Hey, this looks great. Can we expect this to be available in the next release? |
|
||
func createDiscoveryFunction(client *servicediscovery.Client) func(params *servicediscovery.DiscoverInstancesInput) (*servicediscovery.DiscoverInstancesOutput, error) { | ||
return func(params *servicediscovery.DiscoverInstancesInput) (*servicediscovery.DiscoverInstancesOutput, error) { | ||
return client.DiscoverInstances(context.TODO(), params) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It was brought to my attention that this returns "only" the first 100 hosts, and that the results to this API are paginated. Can you handle this in this PR? If you prefer to handle that in a future PR, please open an issue and leave a reference to it here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For this specific case, I'll rather add this pagination as a new iteration.
Will raise the issue as requested.
Thanks for the review.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here is the issue raised described to handle this limitation. #29771
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there a way to know whether there are more than 100 hosts available? If so, perhaps it could warn a message in the logs?
@@ -198,3 +208,27 @@ func (lb *loadBalancerImp) Exporter(endpoint string) (component.Component, error | |||
|
|||
return exp, nil | |||
} | |||
|
|||
func checkMultipleResolversProvided(cfg component.Config) (bool, error) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not entirely comfortable with this change in this PR. I think it has to be on its own PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure, I'll raise a new PR once this is merge with the increment.
a065ed4
to
64f1eaf
Compare
@andretong, once this is ready for another review, let me know but note that I'm unable to do so this year. I'm back on Jan 15, please ping me again on Jan 18 if I haven't reviewed this here by then. |
64f1eaf
to
2cf2d4b
Compare
@jpkrohling thanks for the notice. |
This PR was marked stale due to lack of activity. It will be closed in 14 days. |
bee7b81
to
840f2ef
Compare
@jpkrohling I think I've fixed the errors, can you run the pipeline please? I've check locally and the distribution is built when running |
51cb8ef
to
c9776bf
Compare
|
a2fa3b8
to
70eddfa
Compare
@jpkrohling seems is OK but a need rebase was needed :( |
@jpkrohling all checks have pass :D |
70eddfa
to
0ac5581
Compare
Merging based on #27588 (comment) cc @jpkrohling |
Description:
Implementation of 27241
Adding AWS Service discovery system (CloudMap) support to resolve the Collector's Backend.
This would allow users to use this exporter when there is the case of using ECS over EKS in an AWS infrastructure.
Link to tracking Issue:
Testing:
Setting the following configuration with the proper values, we can start the Load Balancer which will connect to the AWS CloudMap service and start sending traces to the available collectors.
Documentation: